    Integrative Windowing

    In this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can in fact achieve significant run-time gains in noise-free domains, and we explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered, thus avoiding the re-learning of these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing can achieve substantial run-time gains. Furthermore, we discuss the problem of noise in windowing and present an algorithm that is able to achieve run-time gains in a set of experiments in a simple domain with artificial noise.
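
    As a rough illustration of the idea, the sketch below shows an integrative-windowing-style loop in Python. It is not the paper's exact algorithm; learn_rules, covers and consistent are hypothetical callables standing in for a rule learner's components.

```python
# Minimal sketch of the integrative windowing idea (not the paper's exact
# algorithm). learn_rules, covers and consistent are assumed placeholders
# for a rule learner's components.
def integrative_windowing(examples, learn_rules, covers, consistent,
                          init_size=100, grow_size=50, max_iters=20):
    window = list(examples[:init_size])
    theory = []                                   # good rules integrated so far
    for _ in range(max_iters):
        # Learn only on window examples not already explained by the theory.
        remaining = [x for x in window if not any(covers(r, x) for r in theory)]
        for rule in learn_rules(remaining):
            # Integrate a rule as soon as it is consistent with *all* training
            # data, so it never has to be re-learned in later iterations.
            if consistent(rule, examples):
                theory.append(rule)
        uncovered = [x for x in examples if not any(covers(r, x) for r in theory)]
        if not uncovered:                         # every example is explained
            break
        window += uncovered[:grow_size]           # enlarge the window and retry
    return theory
```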

    Optimal investment and location decisions of a firm in a flood risk area using impulse control theory

    Flooding events can affect businesses close to rivers, lakes or coasts. This paper provides an economic partial equilibrium model which helps to understand the optimal location choice for a firm in flood risk areas and its investment strategies. How often, when and how much are firms willing to invest in flood risk protection measures? We apply Impulse Control Theory and develop a continuation algorithm to solve the model numerically. We find that the higher the flood risk and the more the firm values the future, i.e. the more sustainably the firm plans, the more it will invest in flood defense. Investments in productive capital follow a similar path; hence, planning in a sustainable way leads to economic growth. Sociohydrological feedbacks are crucial for the location choice of the firm, whereas different economic settings have an impact on investment strategies. If flood defense is already present, e.g. built up by the government, firms move closer to the water and invest less in flood defense themselves, which allows them to generate higher expected profits. Surprisingly, firms with a large initial productive capital do not try to keep their market advantage, but rather reduce flood risk by reducing exposed productive capital.
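
    The toy Python sketch below only illustrates the flavor of an impulse-control decision (pay a lump sum for flood defense once the discounted expected damage it avoids exceeds its cost); it is not the paper's partial equilibrium model or continuation algorithm, and all parameter values are invented.

```python
# Toy illustration of an impulse-style investment decision under flood risk.
# NOT the paper's model; flood_prob, damage_frac, defense_cost, etc. are
# invented for illustration only.
import random

def simulate_firm(years=50, flood_prob=0.05, damage_frac=0.4,
                  defense_cost=2.0, defense_effect=0.5,
                  growth=0.03, discount=0.97, seed=0):
    rng = random.Random(seed)
    capital, defense, value = 10.0, 0.0, 0.0
    for t in range(years):
        # Impulse decision: pay the lump-sum defense cost once the discounted
        # expected damage it would avoid exceeds that cost.
        avoided = flood_prob * damage_frac * capital * defense_effect / (1 - discount)
        if defense == 0.0 and avoided > defense_cost:
            value -= discount ** t * defense_cost
            defense = defense_effect
        if rng.random() < flood_prob:               # a flood occurs this year
            capital *= 1 - damage_frac * (1 - defense)
        capital *= 1 + growth                       # productive investment/growth
        value += discount ** t * 0.1 * capital      # discounted profit flow
    return value

print(simulate_firm())
```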

    Migration on request, a practical technique for preservation

    Maintaining a digital object in a usable state over time is a crucial aspect of digital preservation, and existing methods of preservation have many drawbacks. This paper describes advanced techniques of data migration which can be used to support preservation more accurately and cost-effectively. To ensure that preserved works can be rendered on current computer systems over time, “traditional migration” has been used to convert data into current formats; as each new format becomes obsolete, another conversion is performed, and so on. Traditional migration has many inherent problems, as errors introduced during one transformation propagate through all future transformations. CAMiLEON’s software longevity principles can be applied to a migration strategy, offering improvements over traditional migration. This new approach is named “Migration on Request.” Migration on Request shifts the burden of preservation onto a single tool, which is maintained over time. Always returning to the original format enables potential errors to be significantly reduced.
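
    A minimal sketch of the Migration on Request idea, assuming hypothetical format names and converter functions: the archive keeps the original bytestream, and one maintained tool converts it to a current format only when access is requested.

```python
# Sketch of "Migration on Request": conversion always starts from the original
# bytestream and happens at access time. Format names and converters below are
# hypothetical placeholders.
from typing import Callable, Dict

# One converter per original format; this registry is the single tool that
# must be maintained over time as current formats change.
CONVERTERS: Dict[str, Callable[[bytes], bytes]] = {
    "legacy-image-v1": lambda raw: raw,   # placeholder: real conversion goes here
}

def render_on_request(original: bytes, original_format: str) -> bytes:
    """Convert the untouched original on demand, so errors never accumulate
    across successive migrations (unlike a traditional migration chain)."""
    try:
        convert = CONVERTERS[original_format]
    except KeyError:
        raise ValueError(f"No converter maintained for {original_format!r}")
    return convert(original)
```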

    Factorizing LambdaMART for cold start recommendations

    Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user; this observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art learning-to-rank algorithm that optimizes such a metric. Despite its success, it does not have a principled regularization mechanism, relying instead on empirical approaches to control model complexity, which leaves it prone to overfitting. Motivated by the fact that very often the users' and items' descriptions as well as the preference behavior can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank essentially acts as a model complexity controller; on top of it we propose additional regularizers that constrain the learned latent representations to reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we propose a weighted variant of NDCG that reduces the penalty for similar items with a large rating discrepancy. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
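
    The sketch below illustrates two of the ingredients described above, assuming hypothetical latent factor matrices and similarity weights: relevance as an inner product of low-rank user and item factors, and an NDCG variant whose per-position gains can be down-weighted for similar items. It is not the authors' implementation.

```python
# Illustrative sketch only: low-rank relevance scoring and a weighted NDCG.
# The factor matrices U, V and the position weights are assumptions.
import numpy as np

def relevance(U, V, user, item):
    # U: (n_users, k) and V: (n_items, k) latent factor matrices, k small
    return float(U[user] @ V[item])

def weighted_ndcg(gains, weights=None, k=10):
    """NDCG@k over gains listed in predicted rank order; if weights are given,
    each position's gain is scaled by a weight in [0, 1], e.g. to soften the
    penalty when a mis-ranked item is very similar to a relevant one."""
    g = np.asarray(gains[:k], dtype=float)
    if weights is not None:
        g = g * np.asarray(weights[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, g.size + 2))
    dcg = float(np.sum(g * discounts))
    ideal = np.sort(np.asarray(gains[:k], dtype=float))[::-1]
    idcg = float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0
```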

    Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation

    Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue: the pre-trained encoder provides a representation of the input text which is then fed to task-specific layers for classification, and the entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Also, if the label space is large, it contains few or no labeled instances for the majority of the labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. 'Appearance of lesion on brain imaging'. To address these issues, we propose constructing extra labeled instances using the label text (i.e. the label's name) as input for the corresponding label index. In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real and these newly constructed artificial labeled instances. We evaluate the proposed approach on an important low-resource biomedical task called PICO annotation, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial, i.e. the PICO (Population, Intervention/Control, Outcome) characteristics of the trial. Our empirical results show that the proposed method achieves a new state-of-the-art performance for PICO annotation, with very significant improvements over competitive baselines.
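
    A minimal sketch of the artificial-instance construction, with illustrative strategies only (the paper proposes several): each label's name is used as an input text paired with that label's index, and the resulting pairs are mixed with the real labeled data before fine-tuning.

```python
# Sketch of building artificial labeled instances from label names.
# The specific variants generated here are illustrative assumptions.
def make_artificial_instances(label_names):
    """label_names: dict mapping label index -> label text,
    e.g. {42: "Appearance of lesion on brain imaging"}."""
    artificial = []
    for idx, name in label_names.items():
        artificial.append((name, idx))           # the label text itself
        artificial.append((name.lower(), idx))   # a simple surface variant
    return artificial

def build_finetuning_set(real_instances, label_names):
    # real_instances: list of (text, label index) pairs from the target task;
    # fine-tuning then runs on the combined real + artificial set.
    return real_instances + make_artificial_instances(label_names)
```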

    Aiding first incident responders using a decision support system based on live drone feeds

    In case of a dangerous incident, such as a fire, a collision or an earthquake, a lot of contextual data is available to the first incident responders handling the incident. Based on this data, a commander on scene or dispatchers need to make split-second decisions to get a good overview of the situation and to avoid further injuries or risks. We therefore propose a decision support system that can aid incident responders on scene in prioritizing the rescue efforts that need to be addressed. The system collects relevant data from a custom-designed drone by detecting objects such as firefighters, fires, victims, fuel tanks, etc. The drone autonomously observes the incident area and, based on the detected information, proposes an action list to incident responders, prioritized based on e.g. urgency or danger.
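
    An illustrative sketch of how detections could be turned into a prioritized action list; the object classes, urgency weights and scoring rule are invented for the example and are not the system's actual logic.

```python
# Toy prioritization of drone detections by an urgency/danger score.
# Class weights and the scoring rule are assumptions for illustration.
from typing import Dict, List

URGENCY = {"victim": 10, "fire": 8, "fuel_tank": 7, "firefighter": 2}

def prioritize(detections: List[Dict]) -> List[Dict]:
    """detections: dicts like {"type": "victim", "confidence": 0.9,
    "near_fire": True}; returns them sorted by descending urgency score."""
    def score(d):
        s = URGENCY.get(d["type"], 1) * d.get("confidence", 1.0)
        if d.get("near_fire"):              # escalate objects close to a fire
            s *= 1.5
        return s
    return sorted(detections, key=score, reverse=True)

actions = prioritize([
    {"type": "fuel_tank", "confidence": 0.8, "near_fire": True},
    {"type": "victim", "confidence": 0.9},
])
```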

    Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability

    Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially with respect to feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: number of features used, interaction strength, and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.
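
    As a rough illustration of one of the three measures, the sketch below estimates the "number of features used" model-agnostically: a feature counts as used if permuting it changes any prediction. This is an approximation for illustration, not the authors' exact definition; predict and X are assumed to be a fitted model's prediction function and a feature matrix.

```python
# Illustrative, model-agnostic estimate of "number of features used":
# a feature is counted if shuffling it changes at least one prediction.
import numpy as np

def n_features_used(predict, X, n_perm=5, seed=0):
    rng = np.random.default_rng(seed)
    baseline = predict(X)
    used = 0
    for j in range(X.shape[1]):
        changed = False
        for _ in range(n_perm):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break this feature's values
            if not np.allclose(predict(Xp), baseline):
                changed = True
                break
        used += changed
    return used
```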

    Modelling fish habitat preference with a genetic algorithm-optimized Takagi-Sugeno model based on pairwise comparisons

    Species-environment relationships are used for evaluating the current status of target species and the potential impact of natural or anthropogenic changes to their habitat. Recent research has reported that the results are strongly affected by the quality of the data set used. The present study attempted to apply pairwise comparisons to modelling fish habitat preference with Takagi-Sugeno-type fuzzy habitat preference models (FHPMs) optimized by a genetic algorithm (GA). The resulting model was compared with an FHPM optimized based on mean squared error (MSE). Three independent data sets were used for training and testing of these models. The FHPMs based on pairwise comparisons produced variable habitat preference curves from 20 different initial conditions in the GA, which could be partially ascribed to the optimization process and the constraints assigned. This case study demonstrates the applicability and limitations of pairwise comparison-based optimization in an FHPM. Future research should focus on a more flexible learning process to make good use of the advantages of pairwise comparisons.
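
    A sketch of a pairwise-comparison objective a GA could maximize for a habitat preference model, in contrast to MSE: only the ordering of preference between pairs of observations matters. The formulation is illustrative and not necessarily the one used in the study.

```python
# Illustrative pairwise-comparison fitness: the fraction of observation pairs
# whose preference ordering the model reproduces. Not the study's exact objective.
def pairwise_fitness(model, observations):
    """observations: list of (habitat_features, observed_preference) pairs;
    model(features) returns the predicted habitat preference."""
    correct = total = 0
    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            (xi, yi), (xj, yj) = observations[i], observations[j]
            if yi == yj:
                continue                         # no ordering to reproduce
            total += 1
            if (model(xi) > model(xj)) == (yi > yj):   # order matches
                correct += 1
    return correct / total if total else 0.0
```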

    Impact of tumor size and tracer uptake heterogeneity in (18)F-FDG PET and CT non-small cell lung cancer tumor delineation

    The objectives of this study were to investigate the relationship between CT- and (18)F-FDG PET-based tumor volumes in non-small cell lung cancer (NSCLC) and the impact of tumor size and uptake heterogeneity on various approaches to delineating uptake on PET images. METHODS: Twenty-five NSCLC patients with (18)F-FDG PET/CT were considered. Seventeen underwent surgical resection of their tumor, and the maximum diameter was measured. Two observers manually delineated the tumors on the CT images and the tumor uptake on the corresponding PET images, using a fixed threshold at 50% of the maximum (T(50)), an adaptive threshold methodology, and the fuzzy locally adaptive Bayesian (FLAB) algorithm. Maximum diameters of the delineated volumes were compared with the histopathology reference when available. The volumes of the tumors were compared, and correlations between the anatomic volume and PET uptake heterogeneity and the differences between delineations were investigated. RESULTS: All maximum diameters measured on PET and CT images correlated significantly with the histopathology reference (r > 0.89, P < 0.0001). Significant differences were observed among the approaches: CT delineation resulted in large overestimation (+32% ± 37%), whereas all delineations on PET images resulted in underestimation (from -15% ± 17% for T(50) to -4% ± 8% for FLAB), except manual delineation (+8% ± 17%). Overall, CT volumes were significantly larger than PET volumes (55 ± 74 cm(3) for CT vs. from 18 ± 25 to 47 ± 76 cm(3) for PET). A significant correlation was found between anatomic tumor size and heterogeneity (larger lesions were more heterogeneous). Finally, the more heterogeneous the tumor uptake, the larger the underestimation of PET volumes by threshold-based techniques. CONCLUSION: Volumes based on CT images were larger than those based on PET images. Tumor size and tracer uptake heterogeneity have an impact on threshold-based methods, which should not be used for the delineation of large heterogeneous NSCLC lesions, as these methods tend to largely underestimate the spatial extent of the functional tumor in such cases. For an accurate delineation of PET volumes in NSCLC, advanced image segmentation algorithms able to deal with tracer uptake heterogeneity should be preferred.
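
    For concreteness, the fixed-threshold (T50) delineation mentioned above can be sketched in a few lines; the adaptive threshold and FLAB methods are more involved and are not shown. pet_roi is assumed to be an array of uptake values around the lesion.

```python
# Minimal sketch of fixed-threshold (T50) delineation on a PET region of interest.
import numpy as np

def t50_mask(pet_roi: np.ndarray, fraction: float = 0.5) -> np.ndarray:
    """Keep voxels with uptake of at least `fraction` of the lesion maximum."""
    return pet_roi >= fraction * pet_roi.max()

# On a heterogeneous lesion the maximum is high relative to much of the tumor,
# so a fixed fraction of it excludes low-uptake tumor regions, which is one way
# to see why threshold methods underestimate heterogeneous volumes.
```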

    Multi-score Learning for Affect Recognition: the Case of Body Postures

    An important challenge in building automatic affective state recognition systems is establishing the ground truth. When the ground truth is not available, observers are often used to label training and testing sets. Unfortunately, inter-rater reliability between observers tends to vary from fair to moderate when dealing with naturalistic expressions. Nevertheless, the most common approach is to label each expression with the most frequent label assigned by the observers to that expression. In this paper, we propose a general pattern recognition framework that takes into account the variability between observers for automatic affect recognition. This leads to what we term a multi-score learning problem, in which a single expression is associated with multiple values representing the scores of each available emotion label. We also propose several performance measures and pattern recognition methods for this framework, and report the experimental results obtained when testing and comparing these methods on two affective posture datasets.
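
    A minimal sketch of the multi-score idea, assuming a toy feature matrix and an illustrative ridge regressor: each posture keeps a vector of per-emotion scores derived from all observers' labels instead of a single majority label, and a multi-output model is trained on those score vectors.

```python
# Sketch only: per-emotion score targets from observer labels, with an
# illustrative multi-output ridge regressor. Emotion set and data are toy
# assumptions, not the paper's datasets or method.
import numpy as np
from sklearn.linear_model import Ridge

def observer_scores(labels_per_observer, emotions):
    """labels_per_observer: e.g. ["angry", "angry", "sad"] for one posture;
    returns the fraction of observers choosing each emotion."""
    counts = np.array([labels_per_observer.count(e) for e in emotions], float)
    return counts / counts.sum()

emotions = ["angry", "happy", "sad"]            # hypothetical label set
X = np.random.rand(20, 6)                        # toy posture features
Y = np.vstack([observer_scores(["angry", "sad", "angry"], emotions)] * 20)
model = Ridge(alpha=1.0).fit(X, Y)               # multi-output score regression
```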